We show how to obtain a Bayesian estimate of the rates or numbers of signal and background
events from a set of events when the shapes of the signal and background distributions are known, 
can be estimated, or approximated; our method works well even if the foreground and background 
event distributions overlap significantly and the nature of any individual event cannot be determined 
with any certainty. We give examples of determining the rates of gravitational-wave events in the 
presence of background triggers from a template bank when noise parameters are known and/or 
can be fit from the trigger data. We also give an example of determining globular-cluster shape, 
location, and density from an observation of a stellar field that contains a non-uniform background 
density of stars superimposed on the cluster stars. 



I. INTRODUCTION 

The task of estimating rates of events when a mixture 
of foreground and background events is present in data is 
a common one in physical and astrophysical applications. 
This problem comes up, among others, in gravitational- 
wave data analysis [e.g., [T}j^ and in astronomical ob- 
servations of a field of objects of mixed provenance [7]. 
In this paper, we introduce a robust formalism for estimating event rates from the data when the shapes of the foreground and background distributions are known (or parameterized), but the provenance of individual events as either background or foreground is unknown.

We use a Bayesian approach and consider all available data to ensure that the inferred rates are both unbiased and maximally constrained in the presence of limited observations. Bayes' theorem yields the posterior probability density function on a set of parameters, $\theta$, given the observed data, $d$, under a model $M$:

\[ p(\theta | d, M) = \frac{p(d | \theta, M)\, p(\theta | M)}{p(d | M)} , \tag{1} \]

where $p(\theta | M)$ are the prior probabilities of the model parameters, $p(d | \theta, M)$ is the likelihood of obtaining the data given a particular choice of parameters, and the normalizing factor $p(d | M)$ is known as the evidence.

Two alternative approaches to rate estimation have 
been suggested and are commonly used. One, known 
as the loudest- event statistic [5l410j. uses only the infor- 
mation from the highest-ranked event in the data to in- 
fer the rate distribution. This approach has been used 
successfully [IHB] when the number of loud foreground 
events is small (typically zero or one) to obtain upper 
limits on foreground rates. However, the loudest-event 
statistic ignores all events except the loudest one, and 
so suffers from an unnecessary loss of information; therefore, we expect it to yield a much larger variance than
strictly necessary when multiple events are present in the 
data. In practice, the loudest-event statistic is typically 
applied repeatedly to multiple "chunks" of data, using 
the estimated rate posterior from each chunk as a rate 
prior for the next chunk's analysis [2H1] • Even when used 
in this mode, the method discards information, with the 
amount of information loss depending on the (arbitrary) 
division of the data into chunks. 

Another possible approach is based on the use of only 
loud, "gold-plated" events, ones which are certain (or 
nearly certain) to come from the foreground, to derive 
rates. We refer to this approach as the foreground- 
dominated statistic. The foreground-dominated statis- 
tic may yield accurate results when the foreground and 
background are cleanly separated, at least for the loud- 
est events, and the number of such loud events is suf- 
ficiently large. However, it cannot properly account for 
marginal events. In addition, the results of the method 
are very sensitive to contamination by the background 
events, and therefore the method requires a careful choice 
of threshold or reliable membership information to distin- 
guish foregrounds and backgrounds for individual events. 
While either the loudest-event statistic or the foreground- 
dominated statistic can approach the accuracy of our pro- 
posed method in specific regimes, both are suboptimal in 
a general case. 

Ref. [11] considered the problem of determining an intrinsic rate and population parameters in the presence of missing data, whether due to thresholding, poor sensitivity, or contamination from noise events. That approach is complementary to ours: we consider the problem of accurately counting the events of different classes present in a data set, while Ref. [11] deals with translating such counts into physical rates by properly accounting for the selection effects on the data set.

In order to demonstrate our method, we consider three 
different examples. The first two come from the field of 
gravitational-wave data analysis, but could equally arise 
in any application that employs matched filtering [12] to
extract weak signals with known shapes from the data. 
The last example considers the case of a globular cluster 
on a background of field stars. Throughout, we compare 
the results obtained with our technique to the loudest- 
event and foreground-dominated statistics, which make 
use of a limited subset of the available information. 



II. MODEL 

We first consider one-dimensional data, but will generalize to the multidimensional case below. We assume that we are presented with a data set of events that exceed a pre-specified threshold in ranking statistic, $x_{\min}$. Each event may be due to either a signal of interest or an uninteresting background. Each event is associated with a ranking statistic, $x$. Our data set therefore consists of the ranking statistics for the set of events:

\[ d = \{ x_i \mid i = 1, \ldots, N \} . \tag{2} \]

The number of events $N$ is also part of the observed data, but we separate out $N$ and the observed ranking statistics, $d$, for convenience. We can choose how to label our events. Ultimately we will label the events in order of ranking statistic, i.e., $x_1 < x_2 < \cdots < x_N$, but some of the derivations that follow are simpler if the events are ordered by time of arrival (i.e., randomly with respect to the $x_i$). We will use $d$ to denote ranking-statistic-ordered events, and $d_{\rm to}$ to denote time-ordered events.

We assume that both the foreground and background events are samples from an inhomogeneous Poisson process with respective differential rates

\[ \frac{dN_f}{dx} = f(x, \theta) \tag{3} \]

and

\[ \frac{dN_b}{dx} = b(x, \theta) , \tag{4} \]

where the $\theta$ argument represents additional "shape" parameters that may affect the distribution, and for which we will eventually fit. The cumulative rates of the two processes are therefore

\[ F(x, \theta) \equiv \int_{x_{\min}}^{x} ds\, f(s, \theta) \tag{5} \]

and

\[ B(x, \theta) \equiv \int_{x_{\min}}^{x} ds\, b(s, \theta) . \tag{6} \]
The assumption that the foreground and background events form an inhomogeneous Poisson process implies

1. The number of events in any range of ranking statistics, $x \in [x_1, x_2]$, is Poisson distributed with rate $F(x_2, \theta) - F(x_1, \theta)$ or $B(x_2, \theta) - B(x_1, \theta)$.

2. The numbers of events in non-overlapping ranges of ranking statistics are independent.

3. The probability of exactly one foreground event between $x$ and $x + h$ is given by

\[ P(n = 1 \in [x, x + h]) = f(x, \theta)\, h + O(h^2) , \tag{7} \]

and similarly for background events.

4. The probability of two or more events in a small range of ranking statistic is negligible:

\[ P(n \geq 2 \in [x, x + h]) = O(h^2) . \tag{8} \]



The foreground and background rates can in general depend on several parameters; the goal of our analysis is to determine the posterior probability distributions for these parameters that are implied by the data. At the least, we will want to know the overall amplitude of the foreground and background rates. Let

\[ f(x, \theta) = R_f\, \hat{f}(x, \theta') \tag{9} \]

and

\[ b(x, \theta) = R_b\, \hat{b}(x, \theta') , \tag{10} \]

where $\hat{F}(\infty, \theta') = \hat{B}(\infty, \theta') = 1$ (with $\hat{F}$ and $\hat{B}$ the cumulative distributions of $\hat{f}$ and $\hat{b}$), and $\theta' = \theta \setminus \{R_f, R_b\}$. Then $R_f = F(\infty, \theta)$ and $R_b = B(\infty, \theta)$ are the total numbers of foreground and background events expected, and $\hat{f}(x, \theta')$ and $\hat{b}(x, \theta')$ are the likelihoods of obtaining an event with ranking statistic $x$ under the foreground and background distributions. In what follows, we will drop the prime, using $\theta$ to denote all parameters of the rate distributions except $R_f$ and $R_b$.

We do not know a priori which of the events are foreground and which are background. For each event, we introduce a flag, $f_i$, which is either 0 (background) or 1 (foreground). These "state" flags are parameters in our model, along with $R_f$, $R_b$, and $\theta$. We can marginalize over our uncertainty in the state of any given event by summing posteriors over $f_i \in \{0, 1\}$.

Assuming time-ordered data, $d_{\rm to}$, in the following, Bayes' theorem relates the posterior probability of the state flags, rates, and shape parameters, $p(\{f_i\}, R_f, R_b, \theta | d_{\rm to}, N)$, to the likelihood of the data, $p(d_{\rm to} | \{f_i\}, N, R_f, R_b, \theta)$, and the prior probability of state flags, rates, and shape parameters before any data are obtained, $p(\{f_i\}, N, R_f, R_b, \theta)$:

\[ p(\{f_i\}, R_f, R_b, \theta | d_{\rm to}, N) = \frac{p(d_{\rm to} | \{f_i\}, N, R_f, R_b, \theta)\, p(\{f_i\}, N, R_f, R_b, \theta)}{p(d_{\rm to}, N)} . \tag{11} \]

The normalization constant, called the evidence, $p(d_{\rm to}, N)$, is independent of the state flags, rates, and shape parameters.

Each foreground event is drawn from the probability distribution $\hat{f}$ and each background event is drawn from the probability distribution $\hat{b}$. The events are independent of each other. Therefore, the likelihood of the data is

\[ p(d_{\rm to} | \{f_i\}, N, R_f, R_b, \theta) = \left[ \prod_{\{i | f_i = 1\}} \hat{f}(x_i, \theta) \right] \left[ \prod_{\{i | f_i = 0\}} \hat{b}(x_i, \theta) \right] . \tag{12} \]

This is the probability that the first observed event is a foreground or background event (according to whether $f_1 = 1$ or $0$) with ranking statistic $x_1$, the second observed event is a foreground or background event (according to whether $f_2 = 1$ or $0$) with ranking statistic $x_2$, etc. If the events are ordered by ranking statistic, the corresponding expression is more complicated, since $x_1$ is now the event from foreground or background with the smallest ranking statistic, etc. We will return to the statistic-ordered case later.

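The prior distribution can be factorized as

\[ \begin{aligned} p(\{f_i\}, N, R_f, R_b, \theta) &= p(\{f_i\} | N, R_f, R_b)\, p(N | R_f, R_b)\, p(R_f, R_b, \theta) \\ &= p(\{f_i\}, N | R_f, R_b)\, p(R_f, R_b, \theta) . \end{aligned} \tag{13} \]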

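The probability that the $i$th state flag is $f_i = 1$ is given by $R_f / (R_f + R_b)$, while the probability that it is zero is $R_b / (R_f + R_b)$, provided the data are time-ordered as we have assumed. Then

\[ p(\{f_i\} | N, R_f, R_b) = \left[ \prod_{\{i | f_i = 1\}} \frac{R_f}{R_f + R_b} \right] \left[ \prod_{\{i | f_i = 0\}} \frac{R_b}{R_f + R_b} \right] = \left( \frac{R_f}{R_f + R_b} \right)^{N_f} \left( \frac{R_b}{R_f + R_b} \right)^{N_b} , \tag{14} \]

where $N_f$ and $N_b$ are the numbers of foreground and background flags, $N_f + N_b = N$. Meanwhile,

\[ p(N | R_f, R_b) = \frac{(R_f + R_b)^N}{N!}\, e^{-(R_f + R_b)} , \tag{15} \]

since the total number of events is Poisson distributed with mean $R_f + R_b$. Combining these yields the conditional probability of the flags on the rates:

\[ p(\{f_i\}, N | R_f, R_b) = \frac{R_f^{N_f} R_b^{N_b}}{N!} \exp[-(R_f + R_b)] . \tag{16} \]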



The last term in Eq. ( 13 ) is a traditional prior. Because 



the rate parameters enter the posterior in the same form 
as Poisson rates, we choose here the Poisson Jeffreys prior 
on rates ^3j , independent of the shape parameters 



\[ p(R_f, R_b, \theta) = \alpha\, \frac{1}{\sqrt{R_f R_b}}\, p(\theta) , \tag{17} \]

where $\alpha$ is a normalization constant; but of course other choices are possible. This choice has the advantage that the prior is normalizable as $R_f, R_b \to 0$, and the exponentials in Eq. (16) regularize the posterior as $R_f, R_b \to \infty$. Putting everything together, the posterior is

\[ p(\{f_i\}, R_f, R_b, \theta | d_{\rm to}, N) = \frac{\alpha}{p(d_{\rm to}, N)\, N!} \left[ \prod_{\{i | f_i = 1\}} R_f \hat{f}(x_i, \theta) \right] \left[ \prod_{\{i | f_i = 0\}} R_b \hat{b}(x_i, \theta) \right] \exp[-(R_f + R_b)]\, \frac{p(\theta)}{\sqrt{R_f R_b}} . \tag{18} \]

When sampling the posterior, the first term, which is independent of the parameters of interest, can be omitted and the equals sign replaced by proportionality; however, we have kept this term explicitly so that we can see the equivalence to ranking-statistic-ordered data. Once data have been observed, there is a unique loudness ordering and time ordering of those events, and so there is a one-to-one correspondence between a time-ordered posterior $p(\{f_i\}, R_f, R_b, \theta | d_{\rm to}, N)$ and the corresponding statistic-ordered posterior $p(\{f_i\}, R_f, R_b, \theta | d, N)$, which means $p(\{f_i\}, R_f, R_b, \theta | d, N) = p(\{f_i\}, R_f, R_b, \theta | d_{\rm to}, N)$. However, the evidence satisfies $p(d, N) = N!\, p(d_{\rm to}, N)$, since there are $N!$ ways in which $N$ events with a given set of ranking statistics can be ordered in time.

The ranking-statistic-ordered posterior can be computed directly by assuming that the flags, $\{f_i\}$, are unobserved data and treating the sets $\{x_i | f_i = 1\}$ and $\{x_i | f_i = 0\}$ as samples from an inhomogeneous Poisson process. For an inhomogeneous Poisson process with rate function $r(y)$ (cumulative rate $R(y)$), the likelihood of a set of samples $\{y_i\}$ is given by

\[ p(\{y_i\} | r)\, d^N y_i = P(\text{zero events below } y_1) \times P(\text{one event between } y_1 \text{ and } y_1 + dy_1) \times P(\text{zero events between } y_1 + dy_1 \text{ and } y_2) \cdots , \tag{19} \]

so that

\[ p(\{y_i\} | r) = \lim_{\delta y_i \to 0} \exp[-R(y_1)]\, [r(y_1) + O(\delta y_1)]\, \exp[-(R(y_2) - R(y_1 + \delta y_1))] \cdots = \left[ \prod_i r(y_i) \right] \exp[-R(\infty)] . \tag{20} \]

Applying this once to the foreground samples, once to the background samples, and taking the product, we obtain $p(d, \{f_i\}, N | R_f, R_b, \theta)$ and thus $p(\{f_i\}, R_f, R_b, \theta | d, N) = p(d, \{f_i\}, N | R_f, R_b, \theta)\, p(R_f, R_b, \theta) / p(d, N)$. With the identification $p(d, N) = N!\, p(d_{\rm to}, N)$, as justified above, we reproduce Eq. (18).

We can marginalize the posterior over the flags, $f_i$, obtaining

\[ p(R_f, R_b, \theta | d, N) = \sum_{\{f_i\} \in \{0,1\}^N} p(\{f_i\}, R_f, R_b, \theta | d, N) \propto \left[ \prod_{i=1}^{N} \left( R_f \hat{f}(x_i, \theta) + R_b \hat{b}(x_i, \theta) \right) \right] \exp[-(R_f + R_b)]\, \frac{p(\theta)}{\sqrt{R_f R_b}} . \tag{21} \]

This expression is useful if we are only interested in rates and not the probability that any particular event is foreground or background. Unlike the full posterior (Eq. (18)), Eq. (21) contains only continuous parameters. We note that, once the product is expanded, the terms that depend on the overall rate parameters, $R_b$ or $R_f$, are of the form $R^{n - 1/2} \exp(-R)$, and so marginalization over either $R_b$ or $R_f$ can be achieved analytically using

\[ \int_0^\infty R^{n - 1/2}\, e^{-R}\, dR = \frac{(2n - 1)!!}{2^n}\, \sqrt{\pi} , \tag{22} \]

using the usual notation $(2n - 1)!! = (2n - 1)(2n - 3) \cdots 1$.
Eq. (18) is unchanged if the ranking statistic is multi-dimensional; in this case, the rates are

\[ R_f = \int d^k x\, f(x, \theta) \tag{23} \]

and

\[ R_b = \int d^k x\, b(x, \theta) , \tag{24} \]

where $f$ and $b$ are rate densities on the $k$-dimensional space of ranking statistics. We give an example of fitting for multi-dimensional rate densities in § V D.
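As a minimal sketch of how the marginalized posterior of Eq. (21) might be evaluated in practice, the following code computes its unnormalized logarithm for fixed shape functions (so that $p(\theta)$ is constant) and checks, by brute force, that summing the flag-dependent terms of Eq. (18) over all $2^N$ flag assignments reproduces the product form. The function names and the two example shape functions `fhat_example` and `bhat_example` are hypothetical, chosen only for illustration; they are not the distributions used in the examples of this paper.

```python
import itertools
import math


def log_marginal_posterior(x, R_f, R_b, fhat, bhat):
    """Unnormalized log of Eq. (21) for fixed shapes (flat p(theta))."""
    if R_f <= 0.0 or R_b <= 0.0:
        return -math.inf
    log_like = sum(math.log(R_f * fhat(xi) + R_b * bhat(xi)) for xi in x)
    # Poisson term plus the Jeffreys prior 1/sqrt(R_f R_b) of Eq. (17)
    return log_like - (R_f + R_b) - 0.5 * math.log(R_f * R_b)


def brute_force_flag_sum(x, R_f, R_b, fhat, bhat):
    """Sum the parameter-dependent part of Eq. (18) over all flag sets."""
    total = 0.0
    for flags in itertools.product((0, 1), repeat=len(x)):
        term = 1.0
        for xi, fi in zip(x, flags):
            term *= R_f * fhat(xi) if fi == 1 else R_b * bhat(xi)
        total += term
    return total * math.exp(-(R_f + R_b)) / math.sqrt(R_f * R_b)


# Hypothetical normalized shapes on x > 1 (illustration only).
def fhat_example(x):
    return 3.0 / x**4  # power law, integrates to 1 on [1, inf)


def bhat_example(x):
    return math.exp(-(x - 1.0))  # exponential, integrates to 1 on [1, inf)
```

The brute-force sum scales as $2^N$ and is only useful as a consistency check on a handful of events; the product form costs $O(N)$ per posterior evaluation, which is what makes sampling Eq. (21) practical.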



III. COMPARISON TO OTHER RATE ESTIMATION METHODS



It is informative to relate these results to two other 
methods for estimating the foreground rate parameter — 
the loudest event statistic and the foreground-dominated 
statistic. 



A. Loudest event statistic 

If we were to include only the $k$ loudest events in the posterior distribution, rather than all observed events, the posterior (Eq. (18)) would be modified by an additional factor of $\exp[R_f \hat{F}(x_{N-k+1}, \theta) + R_b \hat{B}(x_{N-k+1}, \theta)]$, where we have assumed events are ordered by loudness, so that $x_{N-k+1}$ is the $k$-th loudest event. This term accounts for the data-dependent threshold that a loudest-event statistic employs.

For the usual $k = 1$ case [5], the marginalized posterior (Eq. (21)) becomes

\[ p_{\rm LE}(R_f, R_b, \theta | d) \propto \left[ R_f \hat{f}(x_N, \theta) + R_b \hat{b}(x_N, \theta) \right] \exp\left[ -\left( R_f (1 - \hat{F}(x_N, \theta)) + R_b (1 - \hat{B}(x_N, \theta)) \right) \right] \frac{p(\theta)}{\sqrt{R_f R_b}} , \tag{25} \]

where $x_N$ denotes the loudness of the loudest event, and $R_f$ and $R_b$ are the numbers of events expected above our original threshold (so, for example, $R_f (1 - \hat{F}(x_N, \theta))$ is the number of foreground events expected above loudness $x_N$). In the loudest-event statistic paper [8], the authors assume the background distribution and rate are known, which corresponds to using a narrow prior on $R_b$. They further assume a flat prior (in the absence of other experimental data) on $R_f$ and that the foreground and background distributions do not depend on any unknown free parameters. With these assumptions, the posterior on $R_f$, Eq. (25), is modified to



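\[ p_{\rm LE}(R_f | d) \propto \left[ R_f \hat{f}(x_N) + R_b \hat{b}(x_N) \right] \exp\left[ -\left( R_f (1 - \hat{F}(x_N)) + R_b (1 - \hat{B}(x_N)) \right) \right] . \tag{26} \]

Integrating over $R_f$ gives

\[ \int_0^\infty p_{\rm LE}(R_f | d)\, dR_f \propto \frac{R_b \hat{b}(x_N)}{1 - \hat{F}(x_N)}\, e^{-(1 - \hat{B}(x_N)) R_b} \left[ 1 + \frac{\hat{f}(x_N)}{(1 - \hat{F}(x_N))\, R_b \hat{b}(x_N)} \right] , \tag{27} \]

and so the normalized posterior is

\[ p_{\rm LE}(R_f | d) = \frac{1 - \hat{F}(x_N)}{1 + \Lambda} \left[ 1 + R_f (1 - \hat{F}(x_N)) \Lambda \right] \exp\left[ -R_f (1 - \hat{F}(x_N)) \right] , \tag{28} \]

in which we have defined

\[ \Lambda = \frac{\hat{f}(x_N)}{(1 - \hat{F}(x_N))\, R_b \hat{b}(x_N)} . \tag{29} \]

With the further identification $\mu = R_f$ and $\epsilon = 1 - \hat{F}(x_N)$, this is Eq. (14) of [8], and we have shown how their parameter $\Lambda$ is related to the foreground and background distributions used here.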
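The normalized loudest-event posterior of Eq. (28) is simple enough to evaluate directly. The sketch below, with hypothetical function names and arbitrary illustrative input values, computes $\Lambda$ from Eq. (29) and the posterior density, and includes a plain trapezoid rule that can be used to confirm numerically that the density integrates to one.

```python
import math


def lambda_le(f_xn, F_xn, b_xn, R_b):
    """Eq. (29): Lambda from normalized densities evaluated at the loudest event."""
    return f_xn / ((1.0 - F_xn) * R_b * b_xn)


def p_le(R_f, f_xn, F_xn, b_xn, R_b):
    """Eq. (28): normalized loudest-event posterior density on R_f."""
    eps = 1.0 - F_xn  # fraction of foreground events louder than x_N
    lam = lambda_le(f_xn, F_xn, b_xn, R_b)
    return eps * (1.0 + R_f * eps * lam) / (1.0 + lam) * math.exp(-R_f * eps)


def trapezoid(func, lo, hi, n):
    """Composite trapezoid rule with n equal intervals."""
    h = (hi - lo) / n
    total = 0.5 * (func(lo) + func(hi))
    for i in range(1, n):
        total += func(lo + i * h)
    return total * h
```

For example, with the arbitrary inputs $\hat{f}(x_N) = 0.3$, $\hat{F}(x_N) = 0.5$, $\hat{b}(x_N) = 0.05$ and $R_b = 2$, one finds $\Lambda = 6$, and `trapezoid(lambda r: p_le(r, 0.3, 0.5, 0.05, 2.0), 0.0, 100.0, 20000)` should be close to unity.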



Returning now to Eq. (25) and marginalizing over $R_b$, we obtain

\[ p_{\rm LE}(R_f, \theta | d) \propto \left[ \frac{\hat{b}(x_N, \theta)}{2 (1 - \hat{B}(x_N, \theta))} + R_f \hat{f}(x_N, \theta) \right] \frac{p(\theta)}{\sqrt{R_f (1 - \hat{B}(x_N, \theta))}} \exp\left[ -R_f (1 - \hat{F}(x_N, \theta)) \right] . \tag{30} \]

This posterior has a maximum in $R_f$ at

\[ R_f = \frac{\hat{f}(x_N, \theta) - (1 - \hat{F}(x_N, \theta))\, \tilde{b}(x_N, \theta) + \sqrt{g(x_N, \theta)}}{4 \hat{f}(x_N, \theta) (1 - \hat{F}(x_N, \theta))} , \tag{31} \]

where

\[ g(x_N, \theta) = \left( \hat{f}(x_N, \theta) - (1 - \hat{F}(x_N, \theta))\, \tilde{b}(x_N, \theta) \right)^2 - 4\, \tilde{b}(x_N, \theta) (1 - \hat{F}(x_N, \theta))\, \hat{f}(x_N, \theta) \]

and $\tilde{b}(x_N, \theta) = \hat{b}(x_N, \theta) / (1 - \hat{B}(x_N, \theta))$, and similarly for $\tilde{f}(x_N, \theta)$.

If $\tilde{b}(x_N, \theta) \ll \tilde{f}(x_N, \theta)$, we obtain the result $(1 - \hat{F}(x_N, \theta)) R_f \approx 1/2$. This can be understood as the statement that the rate of foreground events with ranking statistic greater than $x_N$, $(1 - \hat{F}(x_N, \theta)) R_f$, is of order 1, as expected. However, $\tilde{b}(x, \theta) = -d[\ln(1 - \hat{B}(x, \theta))]/dx$ and $(1 - \hat{B}(x, \theta)) \to 0$ as $x \to \infty$, so this term may be divergent and, for many reasonable examples, we will find $\tilde{b}(x_N, \theta) \gg \tilde{f}(x_N, \theta)$, in which case the posterior on $R_f$ is peaked at 0. This issue highlights the problem with using a loudest-event statistic with an improper prior on the background rate $R_b$. No matter how improbable an event with $x = x_N$ is under the background distribution, it can become likely that the event at $x_N$ is from the background distribution by taking the background rate to be sufficiently large. Although this predicts many more events with $x < x_N$, by using only the loudest event we do not incorporate the information that no such events are seen. This problem is avoided in the new framework described here, since we use all events detected above threshold, and combined rates, $R_f + R_b$, significantly greater than the total number of observed events are strongly disfavored. As we will see, the problem can also be avoided in the context of the loudest-event framework by even very weak prior information on the background rate, $R_b$, of the kind present in nearly all experiments.

Specifically, the problem can be avoided in the loudest-event framework by including an upper limit on the rate, $R_{\max}$, in the prior for $R_b$.



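The marginalized distribution for the foreground rate then becomes

\[ p_{\rm LE}(R_f, \theta | d) \propto \left\{ \left[ \frac{\hat{b}(x_N, \theta)}{2 (1 - \hat{B}(x_N, \theta))} + R_f \hat{f}(x_N, \theta) \right] \mathrm{erf}\left( \sqrt{(1 - \hat{B}(x_N, \theta)) R_{\max}} \right) - \frac{\hat{b}(x_N, \theta)}{1 - \hat{B}(x_N, \theta)} \sqrt{\frac{(1 - \hat{B}(x_N, \theta)) R_{\max}}{\pi}}\, e^{-(1 - \hat{B}(x_N, \theta)) R_{\max}} \right\} \frac{p(\theta)}{\sqrt{R_f (1 - \hat{B}(x_N, \theta))}} \exp\left[ -R_f (1 - \hat{F}(x_N, \theta)) \right] , \tag{32} \]

where $\mathrm{erf}(x)$ is the error function, defined in the usual way, $\mathrm{erf}(x) = (2/\sqrt{\pi}) \int_0^x \exp(-u^2)\, du$. If $(1 - \hat{B}(x_N, \theta)) R_{\max} \ll 1$, Eq. (32) can be approximated by

\[ p_{\rm LE}(R_f, \theta | d) \propto \left[ \frac{R_{\max}}{3}\, \hat{b}(x_N, \theta) + R_f \hat{f}(x_N, \theta) \right] \frac{p(\theta)}{\sqrt{R_f}} \exp\left( -R_f (1 - \hat{F}(x_N, \theta)) \right) , \tag{33} \]

and if $\hat{f}(x_N, \theta) \gg R_{\max} \hat{b}(x_N, \theta)$ we find the same result as before, $(1 - \hat{F}(x_N, \theta)) R_f \approx 1/2$.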



B. Foreground dominated statistic 

If we set the threshold for including an event, $x_{\min}$, sufficiently high, we can ensure that $\hat{f}(x_i, \theta) \gg \hat{b}(x_i, \theta)$ for all ranking statistics $x_i$ in the data set. If we can further be confident that $R_f \hat{f}(x_i, \theta) \gg R_b \hat{b}(x_i, \theta)$ for all events, then the posterior can be approximated by

\[ p_{\rm FD}(R_f, R_b, \theta | d) \propto \left[ \prod_{i} R_f \hat{f}(x_i, \theta) \right] \exp[-(R_f + R_b)]\, \frac{p(\theta)}{\sqrt{R_f R_b}} . \tag{34} \]

Note that these are posteriors on the number of events expected above the threshold $x_{\min}$. The threshold choice for the foreground-dominated statistic could be different from the threshold choice applied elsewhere. If rates $R_{f,1}$ and $R_{f,2}$ are estimated for thresholds $x_{\min} = x_1$ and $x_{\min} = x_2$ respectively, these rates should be compared by equating $R_{f,1} (1 - \hat{F}(x_2, \theta))$ and $R_{f,2} (1 - \hat{F}(x_1, \theta))$.

Marginalizing over $R_b$ gives a constant factor, and the posterior on the foreground rate becomes

\[ p_{\rm FD}(R_f, \theta | d) \propto R_f^{N - 1/2} \exp[-R_f] \left[ \prod_i \hat{f}(x_i, \theta) \right] p(\theta) . \tag{35} \]

Ignoring the dependence on $\theta$, this is peaked at a rate $R_f = N - 1/2$, so we have the expected result that, in the foreground-dominated regime, the rate is approximately equal to the number of events observed (the 1/2 comes from our use of the Jeffreys prior on the rate).
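The location of this peak is easy to confirm numerically. The following sketch (function names are hypothetical) evaluates the $R_f$-dependent part of the foreground-dominated posterior, $R_f^{N - 1/2} e^{-R_f}$, on a uniform grid and returns the grid point at which it is largest; the result should agree with $N - 1/2$ to within the grid spacing.

```python
import math


def log_p_fd(R_f, n_events):
    """Log of the R_f-dependent part of the foreground-dominated posterior."""
    return (n_events - 0.5) * math.log(R_f) - R_f


def grid_peak(n_events, step=0.01, r_max=100.0):
    """Grid point in (0, r_max] that maximizes log_p_fd."""
    grid = [step * i for i in range(1, int(r_max / step) + 1)]
    return max(grid, key=lambda r: log_p_fd(r, n_events))
```

For instance, `grid_peak(7)` should return a value within one grid step of 6.5.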



IV. THRESHOLDING 

This paper is concerned with Bayesian rate estimates based on lists of events. Ideally, the lists should contain all events in the data set. However, for experimental or computational reasons one may wish to restrict the events to only those above some loudness threshold; in some cases the rate of foreground or background events, or both, is even expected to diverge at certain loudnesses. In this section we address the question of how the rate estimate depends on the threshold value. For a discussion of selection effects, of which thresholding is but one, on the estimate of physical rates, see Ref. [11].

To begin with, we recall the well-known fact that the Bayesian estimator is unbiased, in the following sense. For simplicity, assume that the model consists of a single rate parameter $R$, with prior distribution $p(R)$. Consider an ensemble of data sets whose distribution is consistent with that prior; i.e., such that $p(d)$ is given by

\[ p(d) = \int p(d | R)\, p(R)\, dR . \tag{36} \]

For each data set in the ensemble, compute the Bayesian estimator given by the mean of the posterior, $R_B(d) = \int R\, p(R | d)\, dR$. Then it is immediate that

\[ \int R_B(d)\, p(d)\, dd = \int R\, p(R)\, dR , \tag{37} \]

i.e., the data-weighted average of the Bayesian estimate $R_B$ equals the prior-weighted average of $R$. Therefore all threshold values will yield, on average, the same estimate of the rate. However, this equality of averages does not imply that all threshold values yield the same information. In general, as the threshold is lowered to include more events, the error bar on the estimate shrinks. In this section we give quantitative illustrations of how the error bar shrinks when the threshold is lowered.
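The averaging identity of Eq. (37) can be verified exactly in a simple conjugate setting. The sketch below is an illustration under assumptions that are not part of this paper's analysis: a proper Gamma prior on $R$ with shape `a` and rate `beta` (the Jeffreys prior used elsewhere in this paper is improper, so the prior-weighted average would not exist), and a single Poisson count $n$ as the data. In that case the posterior mean is $(a + n)/(\beta + 1)$ and the marginal distribution of $n$ is negative binomial, so the data-weighted average can be summed directly. All function names are hypothetical.

```python
import math


def marginal_pmf(n, a, beta):
    """p(n): Poisson likelihood integrated over a Gamma(a, rate=beta) prior."""
    log_p = (math.lgamma(a + n) - math.lgamma(a) - math.lgamma(n + 1)
             + a * math.log(beta / (beta + 1.0)) - n * math.log(beta + 1.0))
    return math.exp(log_p)


def posterior_mean(n, a, beta):
    """Mean of the Gamma(a + n, rate=beta + 1) posterior on R."""
    return (a + n) / (beta + 1.0)


def averaged_estimate(a, beta, n_max=2000):
    """Left-hand side of Eq. (37), with the sum over n truncated at n_max."""
    return sum(posterior_mean(n, a, beta) * marginal_pmf(n, a, beta)
               for n in range(n_max + 1))
```

With `a = 2.5` and `beta = 0.5` the prior mean is $a/\beta = 5$, and `averaged_estimate(2.5, 0.5)` should reproduce that value up to truncation and rounding error.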

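Consider the following model problem. Let $\rho(x) = b(x) + f(x) = R_b \hat{b}(x) + R_f \hat{f}(x)$ be the rate density of events (of both foreground and background type) per unit loudness. Here we will assume that the background is normally distributed in loudness, so that $b$ has the form

\[ b(x) = \Gamma_b \exp\left( -\frac{x^2}{2} \right) . \tag{38} \]

We find it useful to define $x_1$ as the loudness such that a data set will have on average a single noise event louder than $x_1$; i.e., such that

\[ \int_{x_1}^{\infty} b(x)\, dx = R_b - B(x_1) = 1 . \tag{39} \]

This condition fixes

\[ \Gamma_b = \left[ \sqrt{\frac{\pi}{2}}\, \mathrm{erfc}\left( \frac{x_1}{\sqrt{2}} \right) \right]^{-1} , \tag{40} \]

while $R_b$ will depend on the threshold, $x_{\rm th}$, as

\[ R_b = \frac{\mathrm{erfc}\left( x_{\rm th} / \sqrt{2} \right)}{\mathrm{erfc}\left( x_1 / \sqrt{2} \right)} . \tag{41} \]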



Let the foreground distribution follow a power law in loudness (this is, for example, the distribution of SNR for gravitational-wave events from uniformly distributed sources in a single detector),

\[ f(x) = 3\, \Gamma_f\, \frac{x_1^3}{x^4} , \tag{42} \]

where $\Gamma_f = R_f - F(x_1)$ is the mean number of foreground events with $x > x_1$. The overall foreground rate is given by

\[ R_f = \Gamma_f \left( \frac{x_1}{x_{\rm th}} \right)^3 . \tag{43} \]

We can write the full $\rho(x)$ as

\[ \rho(x) = \left[ \sqrt{\frac{\pi}{2}}\, \mathrm{erfc}\left( \frac{x_1}{\sqrt{2}} \right) \right]^{-1} \exp\left( -\frac{x^2}{2} \right) + 3\, \Gamma_f\, \frac{x_1^3}{x^4} . \tag{44} \]



For any pair $(x_1, \Gamma_f)$, it is straightforward to construct random event lists drawn from the corresponding $\rho(x)$, and straightforward to apply a threshold by "throwing away" all events with $x$ less than the threshold value $x_{\rm th}$. If $\Gamma_f \gg 1$, then we are in the foreground-dominated regime at $x = x_1$; if $\Gamma_f \ll 1$ we are in the background-dominated regime; and if $\Gamma_f \approx 1$ the foreground and background counts above $x_1$ are about equal. For any thresholded event list, we use Eq. (21) to construct the probability density $p(R_f | d)$. For that event list, we define the foreground rate uncertainty, $\Delta R_f$, by

\[ (\Delta R_f)^2 = \int \left( R_f - R_f^{\rm T} \right)^2 p(R_f | d)\, dR_f , \tag{45} \]

where $R_f^{\rm T}$, the true foreground rate, is given by Eq. (43).
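One possible way to construct such thresholded event lists is sketched below, assuming the Gaussian background and power-law foreground model of this section with threshold $x_{\rm th} > 0$. The sampling choices are ours and are not taken from the analysis in this paper: Poisson counts are drawn by accumulating unit-rate exponential waiting times, the Gaussian tail above the threshold is sampled with a standard rejection method (propose $x = \sqrt{a^2 - 2 \ln U}$, accept with probability $a/x$), and the power-law foreground is sampled by inverting its cumulative distribution. All function names are hypothetical.

```python
import math
import random


def background_rate(x_th, x1):
    """Expected background count above x_th when one is expected above x1."""
    return math.erfc(x_th / math.sqrt(2.0)) / math.erfc(x1 / math.sqrt(2.0))


def foreground_rate(x_th, x1, gamma_f):
    """Expected foreground count above x_th for a 1/x^4 rate density."""
    return gamma_f * (x1 / x_th) ** 3


def poisson_draw(rate, rng):
    """Count unit-rate exponential arrivals that fall inside [0, rate]."""
    n, t = 0, rng.expovariate(1.0)
    while t < rate:
        n += 1
        t += rng.expovariate(1.0)
    return n


def sample_gaussian_tail(a, rng):
    """Draw x > a > 0 with density proportional to exp(-x^2 / 2)."""
    while True:
        x = math.sqrt(a * a - 2.0 * math.log(1.0 - rng.random()))
        if rng.random() * x < a:  # accept with probability a / x
            return x


def draw_event_list(x_th, x1, gamma_f, rng):
    """Return (sorted loudnesses, flags); flag 1 = foreground, 0 = background."""
    n_b = poisson_draw(background_rate(x_th, x1), rng)
    n_f = poisson_draw(foreground_rate(x_th, x1, gamma_f), rng)
    events = [(sample_gaussian_tail(x_th, rng), 0) for _ in range(n_b)]
    events += [(x_th * (1.0 - rng.random()) ** (-1.0 / 3.0), 1)
               for _ in range(n_f)]
    events.sort()
    return [e[0] for e in events], [e[1] for e in events]
```

Because the background rate grows very rapidly as the threshold is lowered below $x_1$, a call such as `draw_event_list(7.5, 8.0, 10.0, random.Random(0))` already produces tens of background events, and thresholds far below $x_1$ quickly become expensive with this simple sampler.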

Figure 1 illustrates how the mean fractional foreground uncertainty, $\langle \Delta R_f \rangle / R_f$, varies with the threshold value $x_{\rm th}$ for the foreground-dominated and comparable-rate regimes. In all cases we assumed that $x_1 = 8$. For large thresholds, where $R_b \ll 1$, increasing the threshold tends to increase the fractional uncertainty on the foreground rate, since fewer foreground events are included in the sample. However, as the threshold passes into the background-dominated regime, the uncertainty in the foreground rate asymptotes to

\[ \frac{\Delta R_f}{R_f} \simeq \frac{1}{\sqrt{\Gamma_f}} , \tag{46} \]

which is the usual Poisson counting uncertainty on the events that stand out from the background (those with $x > x_1$). Note that this uncertainty applies even when the total number of background events is orders of magnitude larger than the number of foreground events. When a threshold must be chosen, it is safest (in the sense of producing the minimal foreground rate uncertainty) to choose the threshold well into the background-dominated loudness regime; the extra background events in the data set do not affect the estimate of the foreground rate, and, when the background distribution is parameterized, can help to better determine these parameters (see § V B).
FIG. 1. The mean foreground rate uncertainty, Eq. (45), as a function of threshold for data sets with $\Gamma_f = 100$ (solid line), $\Gamma_f = 10$ (dashed line), and $\Gamma_f = 1$ (dash-dotted line). Recall that $\Gamma_f$ is the mean number of foreground events above $x_{\rm th} = x_1 = 8$. The total background rate, $R_b(x_{\rm th})$, is shown by the dotted line; we fix $R_b(x_{\rm th} = x_1 = 8) = 1$, so on average there is one background event above $x_1 = 8$. For $x_{\rm th} \gg x_1$, increasing the threshold tends to increase the foreground rate uncertainty because the rate is foreground-dominated and fewer events are included in the data set. For $x_{\rm th} \ll x_1$, the background rate dominates at small loudness, and the foreground rate uncertainty asymptotes to the counting error on the events that stand out from the background, $\Delta R_f / R_f \sim 1/\sqrt{\Gamma_f}$.



Though we have only illustrated the behavior of the rate estimate quantitatively for this specific example of foreground and background rates, the conclusions hold in general. Consider the Fisher information matrix for the posterior distribution in Eq. (21). For a model with parameters $\{\theta_i\}$, the Fisher information matrix has components

\[ F_{ij} = \left\langle \frac{\partial \log p(d | \theta)}{\partial \theta_i}\, \frac{\partial \log p(d | \theta)}{\partial \theta_j} \right\rangle , \tag{47} \]

where the average is taken over the data distribution at fixed $\theta$, $p(d | \theta)$. The components of the Fisher information matrix describe the maximum amount of information about the corresponding parameters available in a given data set; the inverse of the Fisher information matrix gives the Cramér-Rao bound on the covariance matrix of unbiased estimators of $\theta$. For the likelihood that enters Eq. (21), the Fisher information matrix for the rate parameters $(R_f, R_b)$ is

\[ F = (R_f + R_b) \begin{pmatrix} \left\langle \dfrac{\hat{f}^2}{(R_f \hat{f} + R_b \hat{b})^2} \right\rangle & \left\langle \dfrac{\hat{f} \hat{b}}{(R_f \hat{f} + R_b \hat{b})^2} \right\rangle \\[2ex] \left\langle \dfrac{\hat{f} \hat{b}}{(R_f \hat{f} + R_b \hat{b})^2} \right\rangle & \left\langle \dfrac{\hat{b}^2}{(R_f \hat{f} + R_b \hat{b})^2} \right\rangle \end{pmatrix} , \tag{48} \]

where the expectation values are taken over the distributions $\hat{f}$ and $\hat{b}$ (i.e., they are expectations for one event from the combined rate distribution). If the cross-terms are small, then the Cramér-Rao bound on the uncertainty of $R_f$ will be given by

\[ \Delta R_f \geq \frac{1}{\sqrt{R_f + R_b}} \left\langle \left( \frac{\hat{f}}{R_f \hat{f} + R_b \hat{b}} \right)^2 \right\rangle^{-1/2} . \tag{49} \]

Extending a threshold into regions where the factor

\[ \frac{\hat{f}}{R_f \hat{f} + R_b \hat{b}} \tag{50} \]

becomes small (that is, into background-dominated regions) contributes little to reducing the overall uncertainty in the foreground rate. Thus, when the background distribution itself is of no interest and computational costs are high, the threshold does not need to be pushed into background-dominated regions in order to obtain an accurate foreground estimate. This is consistent with the behavior of the specific example in Figure 1.
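To make the saturation of information concrete, the following sketch evaluates the diagonal (cross-terms neglected) Cramér-Rao bound for the Gaussian-plus-power-law model problem of this section. It relies on a standard result for Poisson processes, namely that the information on an amplitude parameter is $\int (\partial \rho / \partial \Gamma_f)^2 / \rho\, dx$ over the observed range, and it parameterizes the foreground by $\Gamma_f$; since $R_f \propto \Gamma_f$ at fixed threshold, the fractional bound is the same for both. The function names, the log-spaced trapezoid rule, and the truncation of the integral at `span * x_th` are choices made here for illustration.

```python
import math


def gamma_b(x1):
    """Background normalization giving one expected noise event above x1."""
    return 1.0 / (math.sqrt(math.pi / 2.0) * math.erfc(x1 / math.sqrt(2.0)))


def fractional_bound(x_th, x1, gamma_f, n=20000, span=1.0e3):
    """Diagonal Cramer-Rao bound on Delta R_f / R_f for threshold x_th."""
    gb = gamma_b(x1)
    log_lo = math.log(x_th)
    h = (math.log(span * x_th) - log_lo) / n
    info = 0.0
    for i in range(n + 1):
        x = math.exp(log_lo + i * h)
        shape = 3.0 * x1**3 / x**4  # d(rho)/d(Gamma_f)
        rho = gamma_f * shape + gb * math.exp(-0.5 * x * x)
        weight = 0.5 if i in (0, n) else 1.0
        info += weight * shape * shape / rho * x * h  # dx = x d(ln x)
    return 1.0 / (gamma_f * math.sqrt(info))
```

In the foreground-dominated limit the bound reduces to $1/\sqrt{R_f}$, and lowering the threshold can only add information, so `fractional_bound` decreases monotonically as `x_th` is reduced while gaining little once the background dominates.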



A. Extreme Sensitivity of the LE Rate Estimate to 
a Single, Unusually Loud Event 

Here we discuss a very unattractive feature of the Bayesian loudest-event estimate of the rate: a small percentage of the time it will yield a very large over-estimate.

To explain this, we will use the same model as described in the previous section, and we will begin with a very specific example. Let $\Gamma_f = 1$, meaning that the expected number of actual events with $x > x_1$ is one. Then there is a 1/64 chance ($1 - \hat{F}(x_{\rm LE}) \approx 1.6\%$) that the loudest event will have $x_{\rm LE} > 4 x_1$. Consider this case, and let us also assume that there are no events (noise or actual) with $x_1 < x < x_{\rm LE}$.

The loudest-event estimate basically "throws away" the information that there are no events in this interval. The maximum of the loudest-event-statistic posterior on $R_f$, Eq. (28), is at

\[ R_f = \frac{\Lambda - 1}{\Lambda\, (1 - \hat{F}(x_{\rm LE}))} . \]

If the value of $\Lambda$ is sufficiently high at $x_1$ (and $\Lambda$ will be even greater at $x_{\rm LE}$), then, for this data set, we would estimate $R_f \approx 1 / (1 - \hat{F}(x_{\rm LE})) > 64$. Thus, for our assumed shape of the foreground distribution, we will estimate the rate of events above $x_1$ to be 64 times the true rate!

Now, if the true rate really were $\Gamma_f = 64$, then the expected number of events with $x > x_1$ would be 64. So in this case, the loudest-event estimate ignores the fact that there are roughly 56 to 72 "missing" events. However, a Bayesian estimate with $x_{\rm th}$ set to $x_1$ incorporates this information quite naturally, and so (correctly) yields an estimated $\Gamma_f$ of order one.
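The arithmetic of this example can be reproduced in a few lines. The sketch below assumes the power-law foreground of the model problem with the threshold placed at $x_1$, so that the fraction of foreground events louder than $x$ is $(x_1/x)^3$, and it evaluates the mode of the loudest-event posterior, $(\Lambda - 1)/[\Lambda (1 - \hat{F})]$, clipped at zero for $\Lambda < 1$ (where the posterior decreases monotonically). The function names and the value $\Lambda = 10^6$ are illustrative assumptions.

```python
def foreground_cdf(x, x1):
    """Normalized cumulative foreground distribution for threshold x1."""
    return 1.0 - (x1 / x) ** 3


def le_mode(lam, F_xle):
    """Mode in R_f of the loudest-event posterior; zero when lam <= 1."""
    if lam <= 1.0:
        return 0.0
    return (lam - 1.0) / (lam * (1.0 - F_xle))
```

With $x_1 = 8$ and a loudest event at $x_{\rm LE} = 4 x_1 = 32$, `le_mode(1.0e6, foreground_cdf(32.0, 8.0))` comes out just below 64, even though the true rate in the example is one.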



V. EXAMPLES 

In this section we present several examples of the appli- 
cation of our framework to various rate estimation prob- 
lems in the presence of background. 



A. Gravitational Waves with Non-Overlapping Templates

Suppose we attempt to detect gravitational-wave signals in a data stream by matched filtering in the frequency domain against a set of $N$ template waveforms [e.g., 12]. We use an extremely simplified model of such a search and the ensuing analysis to demonstrate how our framework could be used in practice.

In our simplistic model, we suppose the data stream consists of stationary Gaussian noise with a power spectral density $S(f)$ combined additively with some number of gravitational-wave signals. We assume that the signals are sufficiently rare that they do not overlap in the data stream. The signal-to-noise ratio (SNR) of a template, $h(f)$, given data, $d(f)$, is

\[ \rho_h \equiv \frac{\langle h, d \rangle}{\sqrt{\langle h, h \rangle}} , \tag{51} \]

where $\langle \cdot, \cdot \rangle$ denotes the noise-weighted inner product:

\[ \langle a, b \rangle \equiv 4 \Re \int_0^\infty df\, \frac{a^*(f)\, b(f)}{S(f)} . \tag{52} \]

We suppose for simplicity that the templates are sufficiently distinct that

\[ \langle h_i, h_j \rangle = \delta_{ij} . \tag{53} \]

In the following subsection, we will generalize the model to overlapping templates. We rank candidate events by their maximum SNR over the entire template bank,

\[ x = \max_h \rho_h , \tag{54} \]

and consider only events that have a maximum SNR above some threshold, $x > x_{\min}$.

For a data stream of pure noise, d{f) — n{f), the SNRs 
of the templates are independent N{0, 1) random vari- 
ables. The background ranking statistic (i.e. the maxi- 
mum SNR over the template bank) then has a cumulative 
distribution without thresholding of 



1 +erf 



B{x) = 



N 



(55) 



where erf(a:) is the error function as before. Imposing 
the threshold, x > x^in, the cumulative distribution of 
the background becomes 

J' fa))" ,50) 



1 + erf 



(^)) 
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FIG. 2. The cumulative distribution of the ranking statistics 
for the synthetic data used to test the formalism on the model
from §V A. The solid line gives the cumulative distribution
of the synthetic data; the dashed line gives the theoretical
cumulative distribution for the models in Eqs. (56) and (57)
combined with $R_f = 10.4$ and $R_b = 95.1$.



for $x > x_{\min}$, and $B(x) = 0$ otherwise.
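As a numerical illustration of the background model, the following is a minimal sketch of the thresholded cumulative distribution of the maximum of $N$ independent unit-normal SNRs. The function names are hypothetical, and the values $N = 1000$ and $x_{\min} = 3.5$ are simply those quoted for the synthetic example; the expression is written in terms of $\Phi(x) = [1 + \operatorname{erf}(x/\sqrt{2})]/2$, which is algebraically the same as dividing numerator and denominator by $2^N$ and avoids overflow for large $N$.

```python
import math

def std_normal_cdf(x):
    """CDF of a unit normal, Phi(x) = (1 + erf(x / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def background_cdf(x, n_templates, x_min):
    """CDF of the maximum of n_templates independent N(0, 1) SNRs,
    conditioned on the maximum exceeding the threshold x_min."""
    if x <= x_min:
        return 0.0
    below = std_normal_cdf(x_min) ** n_templates  # probability of falling below threshold
    return (std_normal_cdf(x) ** n_templates - below) / (1.0 - below)

# Illustrative values taken from the synthetic example in the text.
N, X_MIN = 1000, 3.5
for x in (3.5, 4.0, 5.0, 6.0):
    print(f"B({x}) = {background_cdf(x, N, X_MIN):.4f}")
```

Working with $\Phi(x)^N$ rather than $2^N$ keeps every intermediate quantity between 0 and 1, which matters once the number of templates reaches the thousands.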

The SNR of a gravitational-wave signal in an interferometric detector
scales as $1/d$ [13], where $d$ is the distance to the source. Ignoring
cosmological effects, the number of sources within a distance $d$ scales
as $d^3$. Thus, we expect that the foreground cumulative distribution of
events will follow



$$F(x) = 1 - \frac{x_{\min}^3}{x^3}. \tag{57}$$

Note that this scenario has no shape parameters $\theta$ for the
foreground and background distributions.

To demonstrate the effectiveness of our formalism, we applied it to a
synthetic data set with foreground and background distributions drawn
from Eqs. (56) and (57) using $x_{\min} = 3.5$, with $R_f = 10.4$ and
$R_b = 95.1$ and 1000 templates. The synthetic data consisted of 13
foreground events and 85 background events; the cumulative distribution
for the ranking statistic of the synthetic data appears in Figure 2. We
used a Markov Chain Monte Carlo simulation to draw samples of state flags
and rates from the joint posterior (Eq. (18)).

In Figure 3, we show the marginalized posterior densities for the
foreground and background rates (see Eq. (21)). Figure 4 shows the
posterior foreground probability for each event marginalized over all
other events' types and the foreground and background rates.

We can compare these results to results obtained using the two
approximations described earlier, the loudest-event statistic and the
foreground-dominated statistic. The marginalized distributions for the
foreground rate using these alternatives are shown in Figure 5. In this
case, the loudest event had $x_N = 9.47$. The loudest-event statistic
depends on a specification of the maximum, $R_{\max}$, for the background
rate. We show results for $R_{\max} = \infty$, i.e., the improper prior,
and $R_{\max} = 10000$. The results for other reasonable choices of
$R_{\max} = 100$, $1000$, $100000$, etc. gave exactly the same posterior,
since $b(x_N) R_{\max} \ll f(x_N)$ for all these choices and we are
therefore in the regime where the posterior is insensitive to $R_{\max}$.

To apply the foreground-dominated statistic we must specify a threshold
above which we assume all events are foreground. It is reasonable to do
this based on a specification for the relative probability of an event
being fore/background, $f(x)/b(x) = p_{\rm thresh}$. Setting
$p_{\rm thresh} = 0.99$ gives $x_{\min} = 4.07$ and there are $N = 18$
(11 foreground and 7 background) events exceeding that threshold. Setting
$p_{\rm thresh} = 0.5$ gives $x_{\min} = 3.82$ and there are $N = 30$
(11 foreground and 19 background) events exceeding that threshold. Each
of these thresholds gives a biased estimate of the rate because there are
background events still above threshold. The "omniscient" threshold of
$x_{\min} = 4.38$ produces $N = 7$ (7 foreground and 0 background) events
in this data set, and therefore an unbiased estimate, but of course this
threshold can only be determined because we can examine the synthetic
foreground and background data samples. The threshold may seem obvious
from a visual examination of Figure 4; however, the construction of this
figure relies on the application of the full framework in the first
place. We show results for the first two choices of $x_{\min}$ in
Figure 5; the omniscient choice produces essentially the same posterior
as our full analysis.



The loudest-event statistic with the improper prior gives, as expected,
a poor approximation to the foreground rate. The peak is more accurately
located when a prior maximum rate is defined, but in either case the
distribution is much wider than the one obtained with the full analysis
described here. This is to be expected, as much of the information is
being thrown away. The foreground-dominated statistic gives a reasonable
approximation to the true foreground rate, and a distribution that is
essentially equal to that of the full analysis, for the "omniscient"
choice of threshold value that excludes all background data. For lower
thresholds, even for a threshold where $p_{\rm thresh} = 0.99$, it
performs poorly since we are approximating the foreground rate by the
total foreground plus background rate. This indicates that, provided the
threshold is chosen appropriately, the foreground-dominated statistic can
perform quite well at estimating the rate, but choosing this threshold
correctly is difficult. The fact that it reproduces the posterior from
the full analysis so well indicates that most of the information about
the foreground comes from the loudest events. The full analysis naturally
incorporates inference about the background rate $R_b$ along with the
foreground rate, uses the maximum information from the data set, and
should therefore lead to narrower posteriors in general.
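The threshold choice for the foreground-dominated statistic can be explored numerically. The sketch below assumes a foreground density $f(x) = 3 x_{\min}^3 / x^4$ (the derivative of $F(x) = 1 - x_{\min}^3/x^3$, which follows from the $d^3$ volume scaling), the thresholded Gaussian-maximum background with $N = 1000$ templates, and $x_{\min} = 3.5$. The function names and the bisection bracket are hypothetical choices made for illustration. It solves $f(x)/b(x) = p_{\rm thresh}$ for the two values of $p_{\rm thresh}$ discussed in the text, so the output can be compared with the thresholds quoted there.

```python
import math

N_TEMPLATES, X_MIN = 1000, 3.5  # values quoted in the text

def phi(x):
    """Unit-normal probability density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Unit-normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def foreground_pdf(x):
    """Density of F(x) = 1 - (x_min / x)^3 for x > x_min (assumed form)."""
    return 3.0 * X_MIN**3 / x**4 if x > X_MIN else 0.0

def background_pdf(x):
    """Density of the thresholded maximum of N_TEMPLATES unit normals."""
    if x <= X_MIN:
        return 0.0
    norm = 1.0 - Phi(X_MIN) ** N_TEMPLATES
    return N_TEMPLATES * Phi(x) ** (N_TEMPLATES - 1) * phi(x) / norm

def threshold_for_ratio(p_thresh, lo=X_MIN + 1e-9, hi=10.0, tol=1e-10):
    """Bisect for the x at which f(x) / b(x) = p_thresh.
    Relies on the ratio increasing with x on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if foreground_pdf(mid) / background_pdf(mid) < p_thresh:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for p in (0.5, 0.99):
    print(f"p_thresh = {p}: x_min = {threshold_for_ratio(p):.2f}")
```

Note that the ratio used here is between normalized densities and does not involve the rates, so it can be evaluated before any rate inference is performed.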
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FIG. 3. The marginalized posterior densities for Rf (solid 
line) and $R_b$ (dashed line) for the analytic model discussed
in §V A. The vertical lines indicate the "true" values used to
generate the synthetic data set. Both the true foreground and
background rates lie well within the probability envelope for
$R_f$ and $R_b$.
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FIG. 4. Foreground probability for each event in the syn- 
thetic data set of §V A, marginalized over all other parameters.
True foreground events are in dark grey, background events
in light grey. Even though our method cannot identify the
status of most events with confidence, it can still correctly
estimate the rates (Figure 3).



B. Gravitational Waves With Overlapping Templates



In §V A we assumed that the overlap between different templates in the
template bank was negligible, so the SNRs recovered by different
templates are independent random variables. In fact, template banks are
not constructed in this way [e.g., 14–16], because signals could fall in
the gaps between the non-overlapping templates. We can model this effect
by assuming that a template bank of $N$ actual templates will behave as
if it had $N_{\rm eff}$ independent templates. Rather than pre-computing
$N_{\rm eff}$, we can fit for it as a shape parameter. That is, we assume
that $\theta = \{N_{\rm eff}\}$ is a shape parameter for the background
cumulative distribution:

$$B(x, N_{\rm eff}) = \frac{\left[ 1 + \operatorname{erf}\left( x / \sqrt{2} \right) \right]^{N_{\rm eff}} - \left[ 1 + \operatorname{erf}\left( x_{\min} / \sqrt{2} \right) \right]^{N_{\rm eff}}}{2^{N_{\rm eff}} - \left[ 1 + \operatorname{erf}\left( x_{\min} / \sqrt{2} \right) \right]^{N_{\rm eff}}}. \tag{58}$$

FIG. 5. Posteriors on foreground rate obtained using the method described
in this paper, the loudest-event statistic, and the foreground-dominated
analysis for the data set from §V A. For the loudest-event statistic, we
present the posterior with and without an upper limit on the background
rate, $R_b$; in both cases the rate posterior is significantly wider than
the one obtained with the method described in this paper. For the
foreground-dominated statistic, the limits $x_{\min} = 3.82$ and
$x_{\min} = 4.07$ give likelihood ratios of $f/b = 0.5$ and $0.99$. For
this data set, the thresholds in fact include 19 and 7 background events,
respectively, so the corresponding rate estimates are significantly
biased. An "omniscient" threshold of $x_{\min} = 4.38$ would produce
exactly 7 foreground and zero background events, and the resulting
posterior is essentially indistinguishable from the curve for the full
analysis.

Results from such an analysis appear in Figures 6 and 7. We use the same
parameters and data set as in §V A, with $x_{\min} = 3.5$, $R_f = 10.4$,
$R_b = 95.1$, and $N_{\rm eff} = 1000$, but now allow $N_{\rm eff}$ to be
a parameter of the background distribution, with a flat prior. Both the
rates and the number of effective templates are recovered without
significant loss of accuracy relative to the fixed-$N_{\rm eff}$
situation in §V A.
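Treating $N_{\rm eff}$ as a continuous shape parameter only requires a background density that is well defined for non-integer exponents. A minimal sketch follows, assuming the thresholded Gaussian-maximum form for the background; the function names and the short list of triggers are hypothetical and serve only to show how the log-likelihood of a set of background events varies with $N_{\rm eff}$.

```python
import math

def std_normal_cdf(x):
    """CDF of a unit normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def log_background_pdf(x, n_eff, x_min):
    """log b(x; N_eff): density of the thresholded maximum of N_eff
    independent unit normals. N_eff may be any positive real number."""
    if x <= x_min or n_eff <= 0.0:
        return -math.inf
    # log(1 - Phi(x_min)^N_eff), evaluated stably with expm1
    log_norm = math.log(-math.expm1(n_eff * math.log(std_normal_cdf(x_min))))
    log_phi = -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)
    return (math.log(n_eff) + (n_eff - 1.0) * math.log(std_normal_cdf(x))
            + log_phi - log_norm)

def log_background_likelihood(xs, n_eff, x_min):
    """Sum of log b over events assumed, for this sketch only, to be background."""
    return sum(log_background_pdf(x, n_eff, x_min) for x in xs)

# Hypothetical background-only triggers, not data from the paper.
triggers = [3.55, 3.61, 3.72, 3.80, 3.95, 4.10]
for n_eff in (250.0, 500.0, 1000.0, 2000.0):
    value = log_background_likelihood(triggers, n_eff, 3.5)
    print(f"N_eff = {n_eff:6.0f}: log-likelihood = {value:.3f}")
```

In the full analysis this density would enter the mixture likelihood alongside the foreground density and the two rates, rather than being applied to a list of events known to be background.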

If we consider the two alternative methods, the loudest-event and
foreground-dominated statistics, and apply the same foreground-dominated
thresholds as before, we will recover the same foreground distributions
as are shown in Figure 5. This is because the parameter $N_{\rm eff}$
affects only the background distribution, to which the
foreground-dominated statistic is insensitive, and in the
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FIG. 6. The foreground (solid lines) and background (dashed 
lines) rate posterior, marginalized over all flags and the $N_{\rm eff}$
parameter, for the gravitational wave template detection scenario
with overlapping templates discussed in §V B. The true
values of the rates, $R_f = 10.4$ and $R_b = 95.1$, are indicated
with vertical lines. The distributions are not significantly
wider than those of Figure 3, in spite of the extra parameter.




FIG. 7. The posterior on the number of effective templates,
$N_{\rm eff}$, for the model and data discussed in §V B, marginalized
over all state flags and rates. The true value, $N_{\rm eff} = 1000$, is
indicated by the vertical line.



loudest-event case, after marginalization over $N_{\rm eff}$ we find
$\int_0^{N_{\max}} b(x_N, N_{\rm eff}) \, dN_{\rm eff} \ll (N_{\max} / R_{\max}) f(x_N, N_{\rm eff})$,
and so we are still in the foreground-dominated regime in which the
loudest event tells us nothing about the background. Neither of these
alternative methods can inform us about the value of $N_{\rm eff}$, a
property of the background. Moreover, the choice of threshold value for
the foreground-dominated statistic becomes significantly more
complicated in this case, since $p_{\rm thresh}$ now depends on
$N_{\rm eff}$.



C. Uncertainty in the Foreground and Background Distributions



The framework outlined above relies on the existence of models for the
foreground, $f(x, \theta)$, and background, $b(x, \theta)$, distributions
parameterized by a small number of model parameters, $\theta$. While in
many situations simple analytic functions such as power laws will provide
an adequate description, this will not always be the case. In the absence
of a good analytic model, the space of the ranking statistic $x$ could be
divided into bins, with $f(x)$ and $b(x)$ taken to be flat in each of
these bins. The number of free parameters characterizing each of $f$ and
$b$ is then the number of bins used. While such a framework is model
free, the increase in model parameters will mean that more observed
events will typically be required to achieve the same precision on the
rates and foreground/background distributions.
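The binned alternative can be illustrated with a minimal sketch of a piecewise-constant density with one free weight per bin. The class name, bin edges, and weights are hypothetical; in an actual analysis the weights would be sampled as shape parameters alongside the rates.

```python
import bisect

class BinnedDensity:
    """Piecewise-constant density on fixed bin edges, one free weight per bin."""

    def __init__(self, edges, weights):
        if len(weights) != len(edges) - 1:
            raise ValueError("need exactly one weight per bin")
        if any(w < 0.0 for w in weights) or sum(weights) <= 0.0:
            raise ValueError("weights must be non-negative and not all zero")
        total = sum(weights)
        self.edges = list(edges)
        # height_k * width_k equals the normalized probability of bin k
        self.heights = [w / total / (hi - lo)
                        for w, lo, hi in zip(weights, edges[:-1], edges[1:])]

    def pdf(self, x):
        """Density at x; zero outside the binned range."""
        if not (self.edges[0] <= x < self.edges[-1]):
            return 0.0
        k = bisect.bisect_right(self.edges, x) - 1
        return self.heights[k]

# Hypothetical three-bin background model above a threshold of 3.5.
b_binned = BinnedDensity(edges=[3.5, 4.0, 5.0, 8.0], weights=[6.0, 3.0, 1.0])
for x in (3.7, 4.5, 6.0, 9.0):
    print(x, b_binned.pdf(x))
```

Because the weights are normalized internally, only their ratios matter, so one weight per distribution is redundant with the overall normalization and could be fixed.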

In the context of gravitational wave experiments, additional information
on the ranking statistic distributions for the foreground can be obtained
using mock signal injections into the data, while distributions for the
background can be estimated by analyzing time slides of data sets from
different detectors relative to each other [e.g., 17]. This information
can be readily incorporated in the current framework by assuming there is
another set of $N_I$ events with ranking statistics $\{w_i\}$, known to
be drawn from the foreground distribution ($f_i = 1$), and a set of $N_T$
events with ranking statistics $\{z_i\}$, known to be drawn from the
background distribution ($f_i = 0$). These events will typically not be
drawn with the correct rate parameters, so they do not contribute to the
estimates of $R_f$ and $R_b$, but they do contribute an extra factor

$$\prod_{i=1}^{N_I} f(w_i, \theta) \prod_{i=1}^{N_T} b(z_i, \theta) \tag{59}$$

to the right hand sides of Eqs. (18) and (21). This approach provides a
way to incorporate extra information into the analysis in order to
simultaneously fit for the shape of the background and foreground as well
as the rates. In the limit that there are many more events in the
time-slide and injection data sets, this will reduce to the analysis that
was described above with fixed ranking statistic distributions $f(x)$ and
$b(x)$ given by the injection and time-slide data. We note that this
analysis makes the assumption that the background distribution is the
same in the time-slide and real data and that the foreground distribution
is the same between the injection and real data. The former assumption is
probably reasonable, modulo correlations of non-gravitational-wave origin
between data in different detectors, but the latter relies on knowledge
of the relative astrophysical rates of different events, which is more
uncertain. These astrophysical uncertainties could be handled with a
hybrid approach, in which injections are used to characterize the
statistic distribution for sources of a particular type, while additional
rate or shape parameters are introduced
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to characterize the variation in the astrophysical rate of 
mergers as a function of source type. 
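The extra factor contributed by injections and time slides is straightforward to evaluate in log form. The sketch below assumes that it is the product of foreground densities over the injection statistics and background densities over the time-slide statistics; the power-law foreground, exponential background, parameter names, and data values are hypothetical stand-ins for whatever parameterized shapes an analysis would actually use.

```python
import math

X_MIN = 3.5  # threshold on the ranking statistic (value used earlier in the text)

def log_f(x, theta):
    """Toy foreground density: power law with free index theta['alpha'] (hypothetical)."""
    a = theta["alpha"]
    return math.log(a) + a * math.log(X_MIN) - (a + 1.0) * math.log(x)

def log_b(x, theta):
    """Toy background density: exponential tail with rate theta['lam'] (hypothetical)."""
    lam = theta["lam"]
    return math.log(lam) - lam * (x - X_MIN)

def log_extra_factor(injections, time_slides, theta):
    """log of prod_i f(w_i, theta) * prod_j b(z_j, theta)."""
    return (sum(log_f(w, theta) for w in injections)
            + sum(log_b(z, theta) for z in time_slides))

injections = [3.9, 4.6, 5.2, 7.5, 11.0]       # hypothetical w_i from injections
time_slides = [3.52, 3.55, 3.61, 3.70, 3.84]  # hypothetical z_i from time slides
for alpha in (1.0, 3.0, 6.0):
    theta = {"alpha": alpha, "lam": 8.0}
    print(alpha, round(log_extra_factor(injections, time_slides, theta), 3))
```

This term would simply be added to the log-posterior of the real events; it constrains the shape parameters without carrying any information about the rates.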



D. Star Cluster Parameters With Background 
Contamination 



Our final example concerns fitting for the location and shape parameters
of a cluster of stars observed on top of a stellar background with a
density gradient. In this example, stars are either members of the
cluster (i.e. foreground) or background contamination, with a spatially
varying density (i.e. our rate functions are two-dimensional). Our method
of analysis here is similar to that of De Gennaro et al., but here we
marginalize over membership flags and simultaneously fit foreground and
background densities (i.e. rates) and cluster properties.

We assume that a star cluster has a Plummer surface- 
density profile [mUS], 



$$f(\mathbf{x}) = \frac{1}{\pi r_0^2} \left( 1 + \frac{|\mathbf{x} - \mathbf{x}_0|^2}{r_0^2} \right)^{-2}, \tag{60}$$



where $\mathbf{x}_0$ is the location on the sky of the center of the
cluster, $r_0$ is a radial scale parameter, and $\mathbf{x} = (x, y)$ is
the position on the sky. We assume a square observational domain (see
footnote), $\mathbf{x} \in [0, 1]^2$, and a background that has a density
gradient at an arbitrary orientation with respect to the observational
axes:

$$b(\mathbf{x}) = 1 + \boldsymbol{\gamma} \cdot \left( \mathbf{x} - \mathbf{x}_{1/2} \right), \tag{61}$$

where $\boldsymbol{\gamma}$ is the gradient, and
$\mathbf{x}_{1/2} = [1/2, 1/2]$ is the centroid of the observational
domain.
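The two densities can be written down and checked numerically. The sketch below assumes the standard Plummer surface-density form normalized over the infinite plane and the linear-gradient background on the unit square; the function names and the parameter values (in particular $r_0 = 0.18$ and $\boldsymbol{\gamma} = (1/2, 1/2)$) are illustrative assumptions, not values taken from the paper. The integration shows how much of the cluster profile falls inside the finite domain, which is the normalization issue raised in the footnote.

```python
import math

def plummer_density(x, y, x0, y0, r0):
    """Plummer surface density, normalized to unity over the infinite plane."""
    s2 = ((x - x0) ** 2 + (y - y0) ** 2) / r0 ** 2
    return 1.0 / (math.pi * r0 ** 2 * (1.0 + s2) ** 2)

def background_density(x, y, gx, gy):
    """Linear-gradient background on the unit square, 1 + gamma . (x - x_half)."""
    return 1.0 + gx * (x - 0.5) + gy * (y - 0.5)

def integrate_unit_square(func, n=200):
    """Midpoint-rule integral of func(x, y) over [0, 1]^2."""
    h = 1.0 / n
    return sum(func((i + 0.5) * h, (j + 0.5) * h)
               for i in range(n) for j in range(n)) * h * h

# Illustrative parameter values (assumptions for this sketch).
X0, Y0, R0 = 0.5, 0.5, 0.18
GX, GY = 0.5, 0.5

bg_norm = integrate_unit_square(lambda x, y: background_density(x, y, GX, GY))
cluster_frac = integrate_unit_square(lambda x, y: plummer_density(x, y, X0, Y0, R0))
print(f"background integral over the domain: {bg_norm:.6f}")
print(f"fraction of the cluster profile inside the domain: {cluster_frac:.4f}")
```

Dividing the Plummer density by the in-domain fraction is one simple way to renormalize the cluster profile to the observed field.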

We use simulated data drawn from our model with 
parameters 



$\theta_0 = \{x_0, y_0, r_0, \gamma_x, \gamma_y\} =$



11 11 

2'2'°-^^'"2'2 



(62) 



with $R_f = 1000$ and $R_b = 10000$. For this set of parameters, the
average density of the background and the peak density of the cluster are
comparable; there are an order of magnitude more background stars than
cluster stars in the field. Figure 8 shows the density of stars on the
sky and the particular synthetic data set used for this analysis. Because
the peak density of the cluster is equal to the background density at the
center of the domain, there is no single star in the domain that is more
likely to be a cluster member than a background star (i.e.
$\langle f_i \rangle < 0.5$ for all stars); nevertheless, we will see
that our method provides good constraints on the cluster parameters.
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FIG. 8. Density contours and synthetic data for the example 
in §V D. The contours describe the true density profile with
the parameters in Eq. (62). The points are the realization
of this density profile used as synthetic data in §V D; the
dashed line encloses one Plummer scale radius about the true
cluster center. Because the peak cluster density is equal to
the background density at the cluster center, the cluster is
barely apparent to the eye.



To analyze our synthetic data set, we analytically marginalized over the
state flags (i.e. cluster membership), using the likelihood in Eq. (21).
We did this to take advantage of the emcee sampler of Foreman-Mackey et
al. [20], which requires all parameters to be in $\mathbb{R}$ (i.e.
continuous, unlike the discrete flags). We applied a prior on the shape
parameters that is flat in $\mathbf{x}_0$ and $\boldsymbol{\gamma}$, and
an (approximately) Jeffreys prior on $r_0$,

$$p(r_0) = \frac{\sqrt{R_f}}{r_0}. \tag{63}$$

(Note that this factor of $\sqrt{R_f}$ cancels with the Jeffreys prior on
the rate, $1/\sqrt{R_f}$; we have verified that the priors on these
parameters are irrelevant to our results, as would be expected from the
measurement of ~ 1000 foreground stars.)
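A flag-marginalized log-posterior of the kind needed by a sampler over continuous parameters can be sketched as follows. This assumes that the marginalized likelihood takes the usual Poisson-mixture form, $\prod_i [R_f f(\mathbf{x}_i) + R_b b(\mathbf{x}_i)] \, e^{-(R_f + R_b)}$, with $1/\sqrt{R}$ priors on both rates and a $\sqrt{R_f}/r_0$ prior on the scale radius; it also ignores the finite-domain correction to the cluster normalization. The function names and parameter ordering are hypothetical.

```python
import math

def plummer(x, y, x0, y0, r0):
    """Plummer surface density, normalized over the infinite plane."""
    s2 = ((x - x0) ** 2 + (y - y0) ** 2) / r0 ** 2
    return 1.0 / (math.pi * r0 ** 2 * (1.0 + s2) ** 2)

def gradient_background(x, y, gx, gy):
    """Linear-gradient background density on the unit square."""
    return 1.0 + gx * (x - 0.5) + gy * (y - 0.5)

def log_posterior(params, stars):
    """Flag-marginalized log-posterior (up to a constant) for
    params = (x0, y0, r0, gx, gy, rf, rb) and stars = [(x, y), ...]."""
    x0, y0, r0, gx, gy, rf, rb = params
    if r0 <= 0.0 or rf <= 0.0 or rb <= 0.0:
        return -math.inf
    if abs(gx) + abs(gy) >= 2.0:
        return -math.inf  # background density would reach zero in a corner
    total = -(rf + rb)  # Poisson normalization term
    for x, y in stars:
        total += math.log(rf * plummer(x, y, x0, y0, r0)
                          + rb * gradient_background(x, y, gx, gy))
    # Priors: 1/sqrt(rf) * 1/sqrt(rb) on the rates times sqrt(rf)/r0 on r0;
    # the sqrt(rf) factors cancel, leaving 1 / (r0 * sqrt(rb)).
    total += -0.5 * math.log(rb) - math.log(r0)
    return total

# Tiny hypothetical data set, only to show the call signature.
stars = [(0.48, 0.52), (0.10, 0.90), (0.55, 0.47), (0.80, 0.20)]
print(log_posterior((0.5, 0.5, 0.18, 0.5, 0.5, 1.0, 3.0), stars))
```

A function with this signature, taking a parameter vector and returning a log-probability, is the kind of callable that ensemble samplers such as emcee accept, with the star positions passed as an extra argument.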

Figure 9 shows the posteriors for the cluster parameters. The center of
the cluster, $\mathbf{x}_0$, is localized to within about 5% of the
cluster scale, and the cluster radius is recovered with a relative error
of about 10%. In spite of the significant background, the cluster
parameters are recovered to a relative accuracy consistent with the
expected uncertainty from $N_{\rm eff} \sim R_f = 1000$ measurements.
Figure 10 shows the posteriors inferred on the cluster and background
numbers, $R_f$ and $R_b$.



The observational domain is not infinite, so the normalization
of the cluster density in Eq. (60) is not quite correct. In our
modeling we properly take this into account, but for simplicity
here we ignore it.



VI. DISCUSSION 

In this paper, we have developed a Bayesian framework 
for rate estimation when the data consists of a mixture 
of foreground and background events. We demonstrated 
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FIG. 9. Posterior density for the cluster parameters for the 
example from §V D. (Left) Contours of the posterior proba-
bility distribution for the center of the cluster, $\mathbf{x}_0$. The center
$(x, y) = (x_0, y_0)$ is determined to within about 5% of the
structural radius of the cluster, $r_0$ (see Eq. (62)). (Right)
Posterior density for the scale parameter for the cluster, $r_0$.
The true value is indicated by the vertical line (see Eq. (62)).
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FIG. 10. Posterior densities for the number of stars in the 
cluster ($R_f$) and in the field ($R_b$) in the example from §V D.
Vertical lines indicate the true values (see Eq. (62)).



the application of this framework using several examples 
from gravitational-wave data analysis in the presence of 
signatures of binary mergers and noise triggers, and as- 
tronomical image analysis in the presence of several pop- 
ulations of stars. We showed that this framework is gen- 
erally superior to both the loudest-event statistic and the 
foreground-dominated statistic. 

Through most of this paper, we have assumed that the shape of the
foreground and background distributions is known, or at least can be
modeled with several additional parameters. This is not necessarily easy
to do. For example, in the case of gravitational-wave data analysis, the
shape of the foreground distribution of events may depend on the details
of a complex data-analysis pipeline as well as the astrophysical source
distribution, while the background event distribution depends on data
quality and may deviate significantly from the simple Gaussian-noise
behavior modeled in Section V. Several approaches have been developed to
accurately model both distributions, e.g., through the use of injected
signals [17] or other methods [21] to model the foreground distribution.
However, this is a difficult problem (e.g., because of the need to
estimate the background at the very tails of the distribution), and will
require significant future work. In Section V C, we discussed some of the
possible approaches when the shapes of the background and foreground
distributions cannot be confidently described by models with a few
adjustable parameters.

A further complication is that we have considered the 
rate of events in the data as products of some analy- 
sis pipeline. This rate may be different from the physical 
rate of interest, such as the rate of compact-binary merg- 
ers per unit time per unit volume which generate grav- 
itational waves, or the physical numbers of stars in the 
cluster and field populations which produce the observed 
luminosities. Again, the conversion between the two will 
depend on the details of the data-analysis algorithm and 
ranking statistic, including any selection effects [11], and 
would need to be determined on a case-by-case basis. See
Ref. [10] for an example of such conversion when the underlying
framework is the loudest-event statistic.

Furthermore, in a practical application there could be 
multiple classes of events, not just foreground and back- 
ground. For example, we are not necessarily interested 
in the rate of gravitational-wave signals per se, but sepa- 
rately in the rate of signals from mergers of binary neu- 
tron stars and binary black holes - populations that may 
sometimes be difficult to distinguish. Our approach is 
readily extendable to this particular complication, how- 
ever. Note that it is symmetric with respect to fore- 
ground and background events (as expected, since one 
physicist's background is another physicist's foreground). 
We could relabel foreground and background events into 
other competing event classes, and further classes could 
be added in a straightforward way. However, the ability 
to distinguish classes relies on different distributions of 
their statistics. In general, rankings may need to be ex- 
tended to include other statistics in addition to the signal 
"loudness" statistic in order to indicate both event sig- 
nificance and the probability of event attribution to a 
particular class. 
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